Dental data mining: potential pitfalls and practical issues.

نویسنده

  • S A Gansky
چکیده

Knowledge Discovery and Data Mining (KDD) have become popular buzzwords. But what exactly is data mining? What are its strengths and limitations? Classic regression, artificial neural network (ANN), and classification and regression tree (CART) models are common KDD tools. Some recent reports (e.g., Kattan et al., 1998) show that ANN and CART models can perform better than classic regression models: CART models excel at covariate interactions, while ANN models excel at nonlinear covariates. Model prediction performance is examined with the use of validation procedures and evaluating concordance, sensitivity, specificity, and likelihood ratio. To aid interpretation, various plots of predicted probabilities are utilized, such as lift charts, receiver operating characteristic curves, and cumulative captured-response plots. A dental caries study is used as an illustrative example. This paper compares the performance of logistic regression with KDD methods of CART and ANN in analyzing data from the Rochester caries study. With careful analysis, such as validation with sufficient sample size and the use of proper competitors, problems of naïve KDD analyses (Schwarzer et al., 2000) can be carefully avoided.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Data Mining : Potential Pitfalls and Practical Issues

I Abstract — Knowledge Discovery and Data Mining (KDD) have become popular buzzwords. But what exactly is data mining? What are its strengths and limitations? Classic regression, artificial neural network (ANN), and classification and regression tree (CART) models are common KDD tools. Some recent reports (e.g., Kattan et al., 1998) show that ANN and CART models can perform better than classic ...

متن کامل

Data Mining in the Real World: Experiences, Challenges, and Recommendations

Data mining is used regularly in a variety of industries and is continuing to gain in both popularity and acceptance. However, applying data mining methods to complex real-world tasks is far from straightforward and many pitfalls face data mining practitioners. However, most research in the field tends to focus on the algorithmic issues that arise in data mining and ignores the human element an...

متن کامل

Methodological and practical aspects of data mining

We describe the different stages in the data mining process and discuss some pitfalls and guidelines to circumvent them. Despite the predominant attention on analysis, data selection and pre-processing are the most time-consuming activities, and have a substantial in ̄uence on ultimate success. Successful data mining projects require the involvement of expertise in data mining, company data, and...

متن کامل

The Perils and Pitfalls of Mining SourceForge

SourceForge provides abundant accessible data from Open Source Software development projects, making it an attractive data source for software engineering research. However it is not without theoretical peril and practical pitfalls. In this paper, we outline practical lessons gained from our spidering, parsing and analysis of SourceForge data. SourceForge can be practically difficult: projects ...

متن کامل

Data Mining and XML: Current and Future Issues

The paper describes potential synergies between data mining and XML, which include the representation of discovered data mining knowledge, knowledge discovery from XML documents, XML-based data preparation, and XML-based domain knowledge. Each category is viewed from a theoretical as well as a practical point of view.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Advances in dental research

دوره 17  شماره 

صفحات  -

تاریخ انتشار 2003